Abstract

In the process of item calibration for a computerized adaptive test (CAT), many well-established calibration packages show weaknesses in estimating item parameters. This paper introduces an on-line calibration algorithm based on the convexity of likelihood functions. The resulting package consists of (a) an algorithm that estimates examinee ability and (b) an algorithm that estimates the parameters of a new item seeded into the CAT test. The performance of the new package is comparable with that of BilogMG and in some cases exceeds it.

Key Words: computerized adaptive testing, CAT, item calibration, item parameters, maximization of likelihood, log-likelihood function, precision, BilogMG, DMAP, ICCs, multidimensional test, convexity.
Application of Direct Optimization for Item Calibration
in Computerized Adaptive Testing 

1. Introduction

The problem of item calibration, that is, estimating item parameters when the response model is fixed, is old and has been well discussed in the psychometric literature (e.g., Bock & Aitkin, 1981; Thissen & Steinberg, 1984; Samejima, 1969; Levine, 1984). A few packages, particularly BilogMG, are designed to perform calibration (Bilog 3, 1990; Multilog, 1988). However, nearly all available packages and algorithms are designed for the results of tests given in the classical paper-and-pencil mode. Results from computerized adaptive testing often violate the assumptions underlying these packages, and applying them to a computerized adaptive test generally leads to large biases and standard errors in the item-parameter estimates.

These issues have recently been addressed in work with the computerized adaptive testing (CAT) mode of the Armed Services Vocational Aptitude Battery (ASVAB), which uses a seeded-item design (Segall, Moreno, Bloxom, & Hetter, 1997) to obtain parameters for new items. The CAT-ASVAB seeded-item program provides access to an unbiased examinee population and adds little cost to ongoing operational testing. However, in CAT the matrix of examinee-by-item responses is much sparser than in a classical paper-and-pencil test. CAT tests are short (at most 15 items) because of the adaptation to each examinee, and the examinee population sometimes differs considerably from a normal-normal population (i.e., one whose ability distribution is normal with mean zero and standard deviation one). Moreover, at least one test, General Science, clearly violates the unidimensionality assumption (Zimowski, 1987). Therefore, we have developed an algorithm based on likelihood optimization. It is a parametric algorithm of EM type (McLachlan, 1997) and is not marginal, so it belongs to the class of Direct Maximization A Posteriori (DMAP) algorithms (Baker, 1992). In this paper we describe the new algorithm and compare it with an adjusted BilogMG, the most widely used parametric calibration package.

The DMAP algorithm begins by estimating examinee ability from the test results (Krass, 1997). Using this estimate, it then estimates the 3PL parameters of the seeded item. Next, DMAP re-estimates examinee ability, and it repeats this cycle until convergence to the required precision. Thus DMAP, like most calibration algorithms, is of EM type. In this paper we describe the estimation of examinee ability by DMAP and then the estimation of seeded-item parameters by DMAP, and we present simulation results comparing the performance of DMAP and BilogMG.



2. Estimation of Examinee Ability 



Let our test consist of $I$ items whose item characteristic curves (ICCs) $P_i(\theta)$, $i = 1, \dots, I$, are 3PL ICCs, i.e.,

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp\left(l_i(\theta)\right)}, \tag{1}$$
where $l_i(\theta) = -D a_i (\theta - b_i)$; $a_i$, $b_i$, $c_i$, $i = 1, \dots, I$, are the item discrimination, difficulty, and guessing indexes, respectively; $\theta$ is the latent ability of an examinee; and $D = 1.7$ is a scaling constant (Lord, 1977). We assume that examinee ability satisfies $\theta \in [\theta_{\min}, \theta_{\max}]$, which means that the optimization described below must be done as constrained optimization (a feature that an unconstrained method such as Newton-Raphson does not provide). In CAT-ASVAB we typically set $\theta_{\min} = -3.0$ and $\theta_{\max} = +3.0$. Let our examinee receive a sequence $\{i_1, i_2, \dots, i_k\}$ of items generated by CAT, where $k \le K$ and $K$ is the length of the CAT-ASVAB test (usually $10 \le K \le 15$). Recall that the CAT-ASVAB test is driven entirely by an information table based on an item pool with a rather large exposure-control factor (about 0.7) (Hetter & Sympson, 1997). CAT-ASVAB items are multiple-choice items scored right or wrong, so the examinee produces a dichotomous answer sequence $u_k = \{u_1, u_2, \dots, u_k\}$. Then his or her likelihood function after the first $k$ items of the test is

$$L(u_k, \theta) = g(\theta) \prod_{i=1}^{k} P_i(\theta)^{u_i} \, Q_i(\theta)^{1 - u_i}, \tag{2}$$

where $Q_i(\theta) = 1 - P_i(\theta)$ and $g(\theta)$ is the density of the prior ability distribution in the population of examinees. The value $\theta_k$ that maximizes the likelihood,

$$L(u_k, \theta_k) = \max_{\theta \in [\theta_{\min}, \theta_{\max}]} L(u_k, \theta),$$

is considered the best estimator of the examinee's ability after the first $k$ items of the test. As usual, we assume that the prior ability distribution is normal, $N(\mu, \sigma)$, i.e.,
$$g(\theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(\theta - \mu)^2}{2\sigma^2}\right),$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the prior distribution.
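As a minimal sketch (Python, standard library only), the 3PL ICC (1) and the logarithm of the likelihood (2) under the normal prior can be written as below. The function names `icc_3pl` and `log_posterior` and the item parameters are illustrative, not part of any package; the constant term of $\ln g(\theta)$ is dropped because it does not change the maximizer, and the crude grid search only illustrates what $\theta_k$ means (DMAP itself uses the dichotomy described later).

```python
import math

D = 1.7  # scaling constant from (1)


def icc_3pl(theta, a, b, c):
    """3PL ICC (1): c + (1 - c) / (1 + exp(l)), with l = -D * a * (theta - b)."""
    l = -D * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(l))


def log_posterior(theta, items, responses, mu=0.0, sigma=1.0):
    """Log of (2) up to an additive constant, for items [(a, b, c), ...] and 0/1 responses."""
    value = -0.5 * ((theta - mu) / sigma) ** 2  # log g(theta) without the normalizing constant
    for (a, b, c), u in zip(items, responses):
        p = icc_3pl(theta, a, b, c)
        value += math.log(p) if u == 1 else math.log(1.0 - p)
    return value


# Crude grid maximization over [-3, 3], for illustration only (hypothetical items).
items = [(1.2, 0.0, 0.2), (1.0, 0.5, 0.2)]
responses = [1, 0]
grid = [-3.0 + 0.01 * j for j in range(601)]
theta_k = max(grid, key=lambda t: log_posterior(t, items, responses))
print(f"grid estimate of theta_k: {theta_k:.2f}")
```

At $\theta = b$ the exponent vanishes, so the ICC equals $(1 + c)/2$, which is a convenient spot check.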



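Typically, we begin from the normal-normal prior $N(0, 1)$ and then tune $\mu$ and $\sigma$ to obtain better convergence. Starting from $N(0, 1)$, to find the maximizing $\theta_k$ we consider the log-likelihood, whose derivative, by (1), is

$$\frac{d \ln L(u_k, \theta)}{d\theta} = -\theta + \sum_{i=1}^{k} R_i(\theta),$$

where

$$R_i(\theta) = \begin{cases} \dfrac{(1 - c_i)\, D a_i \exp(l_i(\theta))}{\left(1 + \exp(l_i(\theta))\right)\left(1 + c_i \exp(l_i(\theta))\right)}, & u_i = 1, \\[2ex] -\dfrac{D a_i}{1 + \exp(l_i(\theta))}, & u_i = 0. \end{cases} \tag{3}$$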



To find a zero of the log-likelihood derivative when the maximum is reached inside the segment $[\theta_{\min}, \theta_{\max}]$, we must solve the fixed-point problem for the function $\sum_{i=1}^{k} R_i(\theta)$, i.e., find a solution of the equation

$$\theta = \sum_{i=1}^{k} R_i(\theta). \tag{4}$$
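The functions $R_i(\theta)$ of (3) can be sanity-checked numerically. The Python sketch below assumes the $N(0, 1)$ prior, so the derivative is $-\theta + \sum_i R_i(\theta)$, and compares it with a central finite difference of the log of (2); the item parameters are hypothetical, and the check verifies only the algebra, not any part of DMAP itself.

```python
import math

D = 1.7  # scaling constant from (1)


def icc_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def r_term(theta, a, b, c, u):
    """R_i(theta) from (3) for an item with parameters (a, b, c) and response u."""
    e = math.exp(-D * a * (theta - b))  # exp(l_i(theta)), with l_i = -D a (theta - b)
    if u == 1:
        return (1.0 - c) * e * D * a / ((1.0 + e) * (1.0 + c * e))
    return -D * a / (1.0 + e)


def dlog_posterior(theta, items, responses):
    """d ln L(u_k, theta) / d theta under an N(0, 1) prior: -theta + sum_i R_i(theta)."""
    return -theta + sum(r_term(theta, a, b, c, u) for (a, b, c), u in zip(items, responses))


def log_posterior(theta, items, responses):
    """ln L(u_k, theta) from (2), up to an additive constant, under an N(0, 1) prior."""
    value = -0.5 * theta ** 2
    for (a, b, c), u in zip(items, responses):
        p = icc_3pl(theta, a, b, c)
        value += math.log(p) if u == 1 else math.log(1.0 - p)
    return value


items = [(1.2, -0.5, 0.2), (0.9, 0.7, 0.15)]  # hypothetical item parameters
responses = [1, 0]
h = 1e-6
max_error = max(
    abs((log_posterior(t + h, items, responses) - log_posterior(t - h, items, responses)) / (2 * h)
        - dlog_posterior(t, items, responses))
    for t in (-2.0, 0.0, 1.5)
)
print(f"largest finite-difference discrepancy: {max_error:.1e}")
```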



Equations of this type are heavily studied in the computational mathematics literature (Blum, 1972; Ramsay, 1975), and they are especially easy to solve when each function $R_i(\theta)$ has a constant sign, as here. From (3) it follows that for $u_i = 1$ we have $R_i(\theta) > 0$, and $R_i(\theta) \to 0$ as $\theta \to \pm\infty$. On the other hand, for $u_i = 0$ we have $R_i(\theta) < 0$, with $R_i(\theta) \to 0$ as $\theta \to -\infty$ and $R_i(\theta) \to -D a_i$ as $\theta \to +\infty$. Thus, because the sum $\sum_{j=1}^{k} R_j(\theta)$ is bounded,

$$\theta_{\max} > \sum_{j=1}^{k} R_j(\theta_{\max}) \quad\text{and}\quad \theta_{\min} < \sum_{j=1}^{k} R_j(\theta_{\min})$$

if $\theta_{\max}$ is large enough and $\theta_{\min}$ is small enough. Therefore, the zero of the derivative, which defines the DMAP estimate of examinee ability after the first item administered by CAT, can be found by dichotomy starting from the "right side" if the answer is correct, or from the "left side" otherwise. By the right side we mean first checking whether the inequality

$$\theta_{\max} \le \sum_{j=1}^{k} R_j(\theta_{\max}) \tag{5}$$

holds. If (5) holds, the derivative of $\ln L(u_k, \theta)$ is nonnegative at $\theta_{\max}$; given a single root (as observed below), the log-likelihood increases over the whole domain, so the maximizing ability is $\theta_k = \theta_{\max}$, and the process can continue to the next item. If (5) does not hold, we check the left-side condition $\theta_{\min} \ge \sum_{j=1}^{k} R_j(\theta_{\min})$ to see whether the maximum is reached on the left border, $\theta_k = \theta_{\min}$. After checking the borders, we are sure that at least one solution of (4) lies inside the segment $[\theta_{\min}, \theta_{\max}]$, and it can be found by the following dichotomy process. Let $\theta'_{\min} = \theta_{\min}$ and $\theta'_{\max} = \theta_{\max}$, and define $\theta = \theta'_{\min} + 0.5\,(\theta'_{\max} - \theta'_{\min})$. If $\theta < \sum_{j=1}^{k} R_j(\theta)$, set $\theta'_{\min} = \theta$; in the opposite case, set $\theta'_{\max} = \theta$. The process continues while $\theta'_{\max} - \theta'_{\min} > \delta$, where $\delta$ is the given precision of computation. The algorithm converges at the rate $(\theta_{\max} - \theta_{\min}) \cdot 2^{-n}$, where $n$ is the number of iterations.
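A minimal Python sketch of this procedure follows, assuming the $N(0, 1)$ prior (so the derivative is $-\theta + \sum_j R_j(\theta)$) and a single root inside the segment. The function name `dmap_ability`, the default tolerance, and the item parameters are illustrative choices; the border checks follow (5) and the left-side condition above, and the loop halves the bracket exactly as described.

```python
import math

D = 1.7  # scaling constant from (1)


def r_term(theta, a, b, c, u):
    """R_i(theta) from (3) for one item with parameters (a, b, c) and response u."""
    e = math.exp(-D * a * (theta - b))  # exp(l_i(theta))
    if u == 1:
        return (1.0 - c) * e * D * a / ((1.0 + e) * (1.0 + c * e))
    return -D * a / (1.0 + e)


def dmap_ability(items, responses, theta_min=-3.0, theta_max=3.0, delta=1e-6):
    """Maximizer of (2) on [theta_min, theta_max] under an N(0, 1) prior.

    Border checks first, then dichotomy on theta = sum_j R_j(theta), assuming
    (as reported in the text) a single root inside the segment.
    """
    def rhs(theta):
        return sum(r_term(theta, a, b, c, u) for (a, b, c), u in zip(items, responses))

    if theta_max <= rhs(theta_max):  # (5): derivative >= 0 at the right border
        return theta_max
    if theta_min >= rhs(theta_min):  # derivative <= 0 at the left border
        return theta_min
    lo, hi = theta_min, theta_max
    while hi - lo > delta:
        mid = lo + 0.5 * (hi - lo)
        if mid < rhs(mid):  # derivative > 0 here, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


item1, item2 = (1.2, 0.0, 0.2), (1.0, 0.5, 0.2)  # hypothetical item parameters
after_one = dmap_ability([item1], [1])
after_two = dmap_ability([item1, item2], [1, 0])
print(f"after a correct answer: {after_one:.4f}; after a further wrong answer: {after_two:.4f}")
```

With a tolerance of $10^{-6}$ on a segment of length 6, about 23 halvings suffice, in line with the $2^{-n}$ rate.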

As shown by Samejima (1973), the log-likelihood function (2) is not, in general, unimodal, so (4) can have more than one solution; however, the second solution usually lies outside the "normal" domain. Although our algorithm is designed to search for more than one solution of (4), in more than 1,000,000 applications to simulated and real-life test situations we did not find a second solution of (4) in the domain [-3.0, +3.0].

The properties of (3) imply that, regardless of the first answer, if the second item is answered correctly, the root of equation (4) moves to the right, because a positive term $R_2(\theta)$ is added to the right-hand side of (4); the new root can then be found by dichotomy starting from the right side. If the second item is answered wrongly, the root of (4) moves to the left, and it can be found by dichotomy starting from the left side. This phenomenon is due to the property $R_i(\theta) > 0$ for a correct answer and $R_i(\theta) < 0$ for a wrong answer, and it narrows the domain searched for the maximizing ability as the adaptive test proceeds.

Figure 1 presents a test in which the first item is answered correctly and the second wrongly. The darker curve shows the function $R_1(\theta)$ for the correct first answer, and the lighter curve shows the sum $R_1(\theta) + R_2(\theta)$ for the first two items. The intersection of the straight line $y = \theta$ with the graph of $R_1(\theta) + R_2(\theta)$ gives the DMAP estimate of $\theta$ for a test of length two.
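The geometric construction of Figure 1 can be reproduced numerically. The sketch below scans a fine grid for the point where $\theta - \sum_j R_j(\theta)$ changes sign, i.e., where the line $y = \theta$ meets each curve. The two item parameter triples are hypothetical, chosen only for illustration, and the grid scan is used here for transparency; DMAP itself uses the dichotomy described above.

```python
import math

D = 1.7


def r_term(theta, a, b, c, u):
    """R_i(theta) from (3)."""
    e = math.exp(-D * a * (theta - b))
    if u == 1:
        return (1.0 - c) * e * D * a / ((1.0 + e) * (1.0 + c * e))
    return -D * a / (1.0 + e)


def crossing(func, lo=-3.0, hi=3.0, steps=6000):
    """Midpoint of the first grid cell where theta - func(theta) changes sign."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    for t0, t1 in zip(grid, grid[1:]):
        if (t0 - func(t0)) * (t1 - func(t1)) <= 0.0:
            return 0.5 * (t0 + t1)
    return None  # the line y = theta does not meet the curve inside [lo, hi]


item1 = (1.2, 0.0, 0.2)  # hypothetical parameters, answered correctly
item2 = (1.0, 0.5, 0.2)  # hypothetical parameters, answered wrongly

one_item = crossing(lambda t: r_term(t, *item1, 1))                          # darker curve
two_items = crossing(lambda t: r_term(t, *item1, 1) + r_term(t, *item2, 0))  # lighter curve
print(f"estimate after item 1: {one_item:.3f}; after items 1 and 2: {two_items:.3f}")
```

Because the wrong answer adds a negative $R_2(\theta)$, the second intersection lies to the left of the first, as in the figure.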
(Figure 1 about here.)

In the current CAT-ASVAB, Owen's Bayesian algorithm (Owen, 1975) is applied to estimate examinee ability "on the fly," and a Bayesian modal algorithm (Segall et al., 1997) is applied to the complete response sequence to produce the final ability estimate. The DMAP algorithm described above requires somewhat more computer time (about 1.5 times as much), but it gives more precise estimates of examinee ability in the densest part of the ability distribution.

Figures 2 and 3 show the results of a simulation with 3,000 examinees for Arithmetic Reasoning (AR) in CAT-ASVAB Form 1, where the item pool contains $I = 94$ items. In this simulation we took 3,000 examinees with normally distributed abilities and "recovered" their known "true" abilities by standard Bayesian methods (Figure 2) and by the DMAP algorithm (Figure 3).

(Figures 2 and 3 about here.) 

As we can see, DMAP has about the same precision (in terms of SE or maximum-minimum deviation) as the standard Bayesian algorithm for $\theta > -1.85$, and it is more precise than the standard algorithm for $\theta > -1.05$. In the ability region $\theta < -2.00$, where guessing is a decisive factor for examinees, DMAP typically loses to the standard Bayesian methods, but few examinees fall in that region.
In this section we demonstrate the implementation of the DMAP algorithm for obtaining the 3PL ICC parameters of unknown (seeded) items, assuming that the abilities of the participating examinees have already been estimated. We write the ICC functions given by (1) as $P(a_i, b_i, c_i)(\theta)$, $i = 1, \dots, I$, to emphasize their dependence on the item parameters. A new, $(I+1)$-th item with unknown parameters, called a CAT seeded item, is usually given to an examinee in the second, third, or fourth (randomly chosen) position of his or her exam. If the CAT test is given to $M$ examinees with abilities $\theta_m$, $m = 1, \dots, M$, then the joint likelihood of the response vectors can be written as



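$$L = \prod_{m=1}^{M} \prod_{i=1}^{K+1} P(a_i, b_i, c_i)(\theta_m)^{u_{im}} \left(1 - P(a_i, b_i, c_i)(\theta_m)\right)^{1 - u_{im}}, \tag{6}$$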



where $u_m = (u_{im})$, $i = 1, \dots, I+1$, is the binary response vector of examinee $m = 1, \dots, M$ on the test, including the seeded item, and the inner product runs over the items actually administered to examinee $m$. In expression (6) we took into account that the length of the test increases to $K+1$ because of the inclusion of the seeded item. Relation (6) can be rewritten in the form



$$L = L_0 \prod_{m=1}^{M} P(a, b, c)(\theta_m)^{u_m} \left(1 - P(a, b, c)(\theta_m)\right)^{1 - u_m}, \tag{7}$$

where $(a, b, c) = (a_{I+1}, b_{I+1}, c_{I+1})$ are the parameters of the seeded item and $u_m$ is the response of the $m$-th examinee to the seeded item. Also, $L_0$ is the joint likelihood of the test without the seeded item.
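For concreteness, the seeded-item part of the log of (7) can be sketched in Python as follows; $\ln L_0$ is omitted because it does not depend on $(a, b, c)$. The abilities are treated as known, as assumed in this section, and the simulated data (the "normal" AR item used later in Section 4, with 1,500 standard-normal abilities) are purely illustrative.

```python
import math
import random

D = 1.7


def icc_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def seeded_loglik(a, b, c, thetas, responses):
    """Sum over examinees of u ln P + (1 - u) ln(1 - P) for the seeded item (ln L0 omitted)."""
    total = 0.0
    for theta, u in zip(thetas, responses):
        p = icc_3pl(theta, a, b, c)
        total += u * math.log(p) + (1 - u) * math.log(1.0 - p)
    return total


# Hypothetical data: known abilities, responses drawn from (a, b, c) = (1.3, 0.12, 0.15).
rng = random.Random(2024)
thetas = [rng.gauss(0.0, 1.0) for _ in range(1500)]
responses = [1 if rng.random() < icc_3pl(t, 1.3, 0.12, 0.15) else 0 for t in thetas]
at_truth = seeded_loglik(1.3, 0.12, 0.15, thetas, responses)
far_off = seeded_loglik(1.3, 1.5, 0.30, thetas, responses)
print(f"log-likelihood at true parameters: {at_truth:.1f}; at (1.3, 1.5, 0.30): {far_off:.1f}")
```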

To estimate the parameters of the seeded item, we must maximize the log-likelihood of (7), i.e., solve the problem
$$\ln L = \ln L_0 + \sum_{m=1}^{M} \left[ u_m \ln P(a, b, c)(\theta_m) + (1 - u_m) \ln\left(1 - P(a, b, c)(\theta_m)\right) \right] \to \max, \tag{8}$$



where $(a, b, c) \in [a_{\min}, a_{\max}] \times [b_{\min}, b_{\max}] \times [c_{\min}, c_{\max}]$. The border values $a_{\min}, a_{\max}, \dots$ for the different parameters are user-defined for a test, as in the case of ability estimation. We are interested in constrained maximization over this parallelepiped domain. The DMAP algorithm described below checks the border of the parallelepiped before moving to interior points. If we assume that the maximizing solution of (8) is reached at an interior point of the domain, we must solve the equations



$$\frac{\partial \ln L}{\partial a}(a, b, c) = \frac{\partial \ln L}{\partial b}(a, b, c) = \frac{\partial \ln L}{\partial c}(a, b, c) = 0. \tag{9}$$



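Then, from (9), we have

$$\frac{\partial \ln L}{\partial c} = \sum_{m=1}^{M} \left( \frac{u_m}{P(a, b, c)(\theta_m)} - \frac{1 - u_m}{1 - P(a, b, c)(\theta_m)} \right) \frac{\partial P}{\partial c}(a, b, c)(\theta_m).$$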



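However, from definition (1), the function $\frac{\partial P}{\partial c}(a, b, c)(\theta) = \frac{\exp(l(\theta))}{1 + \exp(l(\theta))}$, with $l(\theta) = -D a (\theta - b)$, does not depend on $c$. Using this fact, we have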



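$$\frac{\partial^2 \ln L}{\partial c^2} = -\sum_{m=1}^{M} \left( \frac{u_m}{P^2(a, b, c)(\theta_m)} + \frac{1 - u_m}{\left(1 - P(a, b, c)(\theta_m)\right)^2} \right) \left( \frac{\partial P}{\partial c}(a, b, c)(\theta_m) \right)^2 < 0.$$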



From this we can state that, for fixed parameters $(a, b)$, the function $\ln L(a, b, c)$ is concave (convex upward) in $c$, and the function $\frac{\partial \ln L}{\partial c}(a, b, c)$ is monotonically decreasing in $c$. As in the case of ability estimation, if $\ln L(a, b, c)$ does not reach its maximum on the border of the segment $[c_{\min}, c_{\max}]$, its maximum is reached at the root of the function $\frac{\partial \ln L}{\partial c}(a, b, c)$, which can be found by a dichotomy process. Below, we describe in more detail how this can be done in our case.
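Let us introduce the function $F_m = 1 + \exp(D a (\theta_m - b))$, $m = 1, \dots, M$; then

$$\frac{\partial P}{\partial c}(a, b, c)(\theta_m) = \frac{1}{F_m}, \qquad P(a, b, c)(\theta_m) = 1 + \frac{c - 1}{F_m}.$$

After some algebra we have

$$\frac{\partial \ln L}{\partial c} = \sum_{m=1}^{M} \left( \frac{u_m}{F_m + c - 1} - \frac{1 - u_m}{1 - c} \right) = \sum_{m=1}^{M} \frac{u_m}{F_m + c - 1} - \frac{N}{1 - c}, \tag{10}$$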



where $N$ is the total number of wrong answers to the seeded item in the test. If $N = 0$, i.e., there are no wrong answers ($u_m = 1$, $m = 1, \dots, M$), then for $c = 1$ (the case of "perfect guessing") we have

$$\frac{\partial \ln L}{\partial c}\bigg|_{c=1} = \sum_{m=1}^{M} \frac{1}{F_m} > 0,$$

which, because $\frac{\partial \ln L}{\partial c}$ is monotonically decreasing, means that $\frac{\partial \ln L}{\partial c} > 0$ for all $c$; hence the log-likelihood function $\ln L$ is monotonically increasing and reaches its maximum at the right end, $c = 1$. If $N > 0$, so that there is an examinee $m_0$ with $u_{m_0} = 0$, then $\frac{\partial \ln L}{\partial c} \to -\infty$ as $c \to 1$, and the behavior of $\ln L$ depends on the behavior of $\frac{\partial \ln L}{\partial c}$ at the left end, $c = 0$. At $c = 0$,

$$\frac{\partial \ln L}{\partial c}\bigg|_{c=0} = \sum_{m=1}^{M} \frac{u_m}{F_m - 1} - N = \sum_{m=1}^{M} \frac{u_m F_m}{F_m - 1} - M = \sum_{m=1}^{M} \frac{u_m}{P_m} - M, \tag{11}$$

where $P_m = P(a, b, 0)(\theta_m)$. From (11) it follows that if $\sum_{m=1}^{M} u_m / P_m - M < 0$, then the likelihood function is monotonically decreasing and reaches its maximum at the left end, $c = 0$. If $\sum_{m=1}^{M} u_m / P_m - M > 0$, the function $\frac{\partial \ln L}{\partial c}(a, b, c)$ has a single root, which can be found by dichotomy. This root, $c = c(a, b)$, provides the sought maximum of the likelihood. Using this, we search through a dense net of points $(a_j, b_j)$, $j = 1, \dots, J$, where $(a_j, b_j) \in A \times B$, computing the likelihood $L(a_j, b_j, c(a_j, b_j))$ and obtaining an approximate maximum whose precision depends on the density of the net. This search can be shortened considerably if we use the concavity (convexity upward) of the function $L(a, b, c(a, b))$ in $b$ for fixed $a \in [a_{\min}, a_{\max}]$, which is shown in the Appendix under some approximation. Again, after more than 1,000,000 experiments, we can state that this approximation holds in our case, i.e., the function $L(a, b, c(a, b))$ is concave in $b$.
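A minimal Python sketch of this procedure follows. It finds $c(a, b)$ by dichotomy on (10), first using the monotone decrease of $\partial \ln L / \partial c$ to check the borders, and then scans a net of $(a, b)$ points. Assumptions: abilities are known; the border values $[c_{\min}, c_{\max}] = [0, 0.5]$ (with $c_{\max} < 1$ so that (10) stays finite), the grid, and the simulated item are hypothetical choices; and the plain grid scan omits the speed-up from concavity in $b$.

```python
import math
import random

D = 1.7  # scaling constant from (1)


def profile_c(a, b, thetas, responses, c_min=0.0, c_max=0.5, delta=1e-6):
    """c(a, b): maximizer of ln L in c on [c_min, c_max] (c_max < 1), via (10) and dichotomy."""
    f_right = [1.0 + math.exp(D * a * (t - b)) for t, u in zip(thetas, responses) if u == 1]
    n_wrong = len(responses) - len(f_right)

    def dlnl_dc(c):
        # (10): sum over correct answers of 1 / (F_m + c - 1), minus N / (1 - c)
        return sum(1.0 / (f + c - 1.0) for f in f_right) - n_wrong / (1.0 - c)

    if dlnl_dc(c_min) <= 0.0:  # derivative decreases in c, so ln L decreases on the segment
        return c_min
    if dlnl_dc(c_max) >= 0.0:  # ln L still increasing at the right border
        return c_max
    lo, hi = c_min, c_max
    while hi - lo > delta:
        mid = 0.5 * (lo + hi)
        if dlnl_dc(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def seeded_loglik(a, b, c, thetas, responses):
    """Seeded-item part of (8)."""
    total = 0.0
    for t, u in zip(thetas, responses):
        p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (t - b)))
        total += u * math.log(p) + (1 - u) * math.log(1.0 - p)
    return total


def grid_search(thetas, responses, a_grid, b_grid):
    """Approximate maximizer (a, b, c(a, b)) of ln L over a net of (a, b) points."""
    best = None
    for a in a_grid:
        for b in b_grid:
            c = profile_c(a, b, thetas, responses)
            value = seeded_loglik(a, b, c, thetas, responses)
            if best is None or value > best[0]:
                best = (value, a, b, c)
    return best[1:]


# Hypothetical data: known abilities, responses to the "normal" item (1.3, 0.12, 0.15).
rng = random.Random(7)
thetas = [rng.gauss(0.0, 1.0) for _ in range(1500)]
responses = [
    1 if rng.random() < 0.15 + 0.85 / (1.0 + math.exp(-D * 1.3 * (t - 0.12))) else 0
    for t in thetas
]
a_grid = [0.8 + 0.1 * i for i in range(11)]   # 0.8 ... 1.8
b_grid = [-0.6 + 0.1 * j for j in range(15)]  # -0.6 ... 0.8
a_hat, b_hat, c_hat = grid_search(thetas, responses, a_grid, b_grid)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}, c = {c_hat:.3f}")
```

Precomputing $F_m$ once per $(a, b)$ keeps each dichotomy step to a single pass over the correct answers.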

4. Comparing the Performance of DMAP and BilogMG

We compare the performance of the DMAP algorithm with that of BilogMG through a set of simulations, but first the BilogMG package must be adjusted to obtain reasonable performance. As we have explained, the response matrix of a CAT test is sparse. Further, items with low information are used very rarely and items with high information are used too often, so the response matrix is filled very non-uniformly. As a result, BilogMG often fails to converge and produces parameters far from reality. To avoid this problem, we run BilogMG in two stages. First, we simulate a paper-and-pencil test for our set of $M$ examinees on items that belong to the CAT-ASVAB item pool. Then we run BilogMG and save the item-pool estimates with the BilogMG "SAVE" statement. After that, we run BilogMG on the data obtained from the simulated or real CAT-ASVAB test, with the seeded item included, using the preliminary estimates through the "IFNAME" subcommand of the "GLOBAL" statement. With this approach, BilogMG converges and provides reasonable and stable estimates for a population of examinees with normally distributed abilities. After much experimentation, we decided to use 30 quadrature points in the marginal estimation for BilogMG.

To compare performance in the "normal" situation, we use three representative items from the AR item pool: an "easy" item, $(a, b, c) = (1.17, -1.63, 0.13)$; a "normal" item, $(a, b, c) = (1.3, 0.12, 0.15)$; and a "hard" item, $(a, b, c) = (1.23, 1.63, 0.07)$. All three items are quite informative in their ranges of difficulty. Then, for each item, we run the CAT-ASVAB test simulation twenty times for $M$ examinees, changing the random seed each time to generate different response matrices. In every run we use DMAP and the adjusted BilogMG to re-estimate the parameters of the items described above. We found that both packages provide unbiased parameter estimates; the major differences are in the precision of those estimates.

First, we run our simulation for different numbers of examinees with a normal-normal ability distribution, taking $M \in \{300, 500, 750, 1000, 1500, 2000\}$. In this experiment we try to identify the number of examinees needed to provide the most precise parameter estimates. Table 1 shows the variances of the estimates of the three parameters in this experiment.

(Table 1 about here.) 
These results are shown graphically in Figure 4 (the graphs depict the variances of the estimates of $a$, $b$, and $c$, respectively).

(Figure 4 about here.) 

As we can see, DMAP requires at least 1,000 examinees per test to obtain variances of the $a$ and $b$ parameters comparable with those of BilogMG, and BilogMG always estimates parameter $c$ more precisely. However, this last advantage disappears if we measure the weighted average distance between the "true" ICCs of the studied items and the ICCs built from the estimated 3PL parameters. Here, by the weighted distance between two ICCs we mean

$$D = \sqrt{\sum_{j=1}^{50} w_j \left( P(\hat a, \hat b, \hat c)(\theta_j) - P(a, b, c)(\theta_j) \right)^2},$$

where $(\hat a, \hat b, \hat c)$ is the estimate of the "true" parameters $(a, b, c)$ produced by a package in a particular simulation experiment; $\theta_j$, $j = 1, \dots, 50$, are equidistant points in the ability domain $[-3.0, 3.0]$; and the weights follow the standard normal density, i.e., $w_j$ is proportional to the $N(0, 1)$ density at $\theta_j$, normalized so that $\sum_{j=1}^{50} w_j = 1$. Table 2 and Figure 5 show that, in terms of distances between ICCs, both algorithms perform about equally, even though BilogMG estimates the guessing parameter $c$ better than DMAP.
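A minimal Python sketch of the weighted distance follows, assuming the form $D = \sqrt{\sum_j w_j (\hat P_j - P_j)^2}$ with 50 equidistant points on $[-3, 3]$ and weights proportional to the $N(0, 1)$ density, normalized to sum to one. The ICC scaling constant is renamed `D_SCALE` only to avoid a clash with the distance $D$, and the parameter values are illustrative.

```python
import math

D_SCALE = 1.7  # the ICC scaling constant D from (1)


def icc_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D_SCALE * a * (theta - b)))


def weighted_icc_distance(est, true, n_points=50, lo=-3.0, hi=3.0):
    """sqrt(sum_j w_j * (P_est(theta_j) - P_true(theta_j))**2) with normal-density weights."""
    thetas = [lo + (hi - lo) * j / (n_points - 1) for j in range(n_points)]
    raw = [math.exp(-0.5 * t * t) for t in thetas]  # N(0, 1) density up to a constant
    total = sum(raw)
    return math.sqrt(sum((w / total) * (icc_3pl(t, *est) - icc_3pl(t, *true)) ** 2
                         for w, t in zip(raw, thetas)))


true_item = (1.3, 0.12, 0.15)
print(weighted_icc_distance((1.3, 0.12, 0.20), true_item))  # only c differs
print(weighted_icc_distance((1.1, 0.30, 0.15), true_item))  # a and b differ
```

Because an error in $c$ shifts the ICC only where $\exp(l)/(1 + \exp(l))$ is large, that is, at low ability, its contribution is damped by the small normal weights there, which is consistent with the explanation that follows in the text.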

(Table 2 and Figure 5 about here). 

This is because the guessing parameter has a strong influence only where the density of the examinee population is small. From this simulation experiment we see that the performance of both packages is about the same for $M = 1{,}500$, which we assume in all further simulations; in "real" on-line calibration with CAT-ASVAB, we likewise begin calibrating a seeded item once it has been answered by about 1,500 examinees.

In CAT-ASVAB, the normality of the examinee ability distribution is often violated because of seasonal and geographic differences. To simulate this situation, we consider two types of artificial populations. In the first type, we mix 750 examinees with a normal-normal ability distribution and 750 examinees with ability distribution $N(-0.8, 1.0)$. After mixing, we obtain a non-normally distributed population of examinees with mean ability $-0.4$ and SE $\approx 1.15$. We call this population "less able" (relative to the test). In the same way, we build a "more able" population with mean $+0.4$ and the same SE $\approx 1.15$. In both cases, we apply the previously described simulation to the same three items of CAT-ASVAB Form 1 AR. We find that the variances of the 3PL parameter estimates are about the same as in the normal case described above; the main differences are in the biases of the parameter estimates. Those biases are shown in Figure 6.
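The mixed populations can be generated as in the Python sketch below, assuming the "more able" population mirrors the "less able" one (750 standard-normal abilities plus 750 from $N(0.8, 1.0)$); the seeds are arbitrary. Analytically, an equal-weight mixture of $N(0, 1)$ and $N(\pm 0.8, 1)$ has mean $\pm 0.4$ and variance $1 + 0.4^2 = 1.16$, i.e., a standard deviation of about 1.08, which the sample statistics should approximate.

```python
import random
import statistics


def mixed_population(shift, n_each=750, seed=0):
    """n_each abilities from N(0, 1) plus n_each from N(shift, 1)."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_each)]
    thetas += [rng.gauss(shift, 1.0) for _ in range(n_each)]
    return thetas


less_able = mixed_population(-0.8, seed=1)
more_able = mixed_population(+0.8, seed=2)
for name, pop in (("less able", less_able), ("more able", more_able)):
    print(f"{name}: mean {statistics.mean(pop):+.2f}, SD {statistics.stdev(pop):.2f}")

# Exact moments of the equal-weight mixture: mean shift / 2, variance 1 + (shift / 2) ** 2.
exact_var = 1.0 + 0.4 ** 2
print(f"exact variance {exact_var:.2f}, exact SD {exact_var ** 0.5:.3f}")
```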

(Figure 6 about here.) 

As we can see, BilogMG becomes significantly biased in estimating the difficulty parameters: it overestimates them for the "less able" population and underestimates them for the "more able" population. As a result, the average weighted distance between the estimated and "true" ICCs increases significantly for BilogMG (Figure 7). In contrast, the bias increases for DMAP are not significant relative to the normal case.

(Figure 7 about here.) 

As we have mentioned, one CAT-ASVAB test is essentially not unidimensional: General Science, which consists of three subtests, Physical Science, Biological Science, and Chemical Science. To simulate this test, we assume that every simulee has three abilities, one for each subtest, which are normal-normal distributed but highly correlated, with a correlation coefficient of 0.8. Thus, the correlation matrix of General Science abilities in this population is

$$R = \begin{pmatrix} 1.0 & 0.8 & 0.8 \\ 0.8 & 1.0 & 0.8 \\ 0.8 & 0.8 & 1.0 \end{pmatrix}.$$



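We would like to obtain a three-dimensional ability vector $\theta = (\theta_1, \theta_2, \theta_3)$ such that every component has a normal distribution with mean 0 and the correlation matrix of the components equals $R$. To do this, we factor $R$ in the form $R = A A^T$, where $A^T$ is the transpose of $A$, a square root of $R$. We build $A$ from the eigendecomposition of $R$ rather than as a triangular Cholesky factor (any factor with $R = A A^T$ would do): $A = Q\,\mathrm{diag}(\sqrt{\lambda})$, where $\lambda = (\lambda_1, \lambda_2, \lambda_3)$ are the eigenvalues of $R$ and $Q$ is the three-dimensional orthogonal matrix of its eigenvectors. In our case $\lambda_1 = \lambda_2 = 0.2$ and $\lambda_3 = 2.6$, and

$$Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{pmatrix}.$$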



Then, if the vector $\xi = (\xi_1, \xi_2, \xi_3)$ consists of three independent, identically distributed components from $N(0, 1)$, the vector $\theta = \xi A^T = (\theta_1, \theta_2, \theta_3)$ has the desired multidimensional distribution (Bickel & Doksum, 1977). Thus, if a simulee gets a Physical Science item, we use ability $\theta_1$ to generate the response to that item; if a Biological Science item, we use $\theta_2$; and if a Chemical Science item, we use $\theta_3$.
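The construction can be checked with a short Python sketch (standard library only). It builds $A = Q\,\mathrm{diag}(\sqrt{\lambda})$ from the eigenvalues and eigenvectors of $R$ listed above, confirms $A A^T = R$, and draws ability triples $\theta = \xi A^T$. The subtest order (Physical, Biological, Chemical) follows the text; the seed and sample size are arbitrary choices.

```python
import math
import random

R = [[1.0, 0.8, 0.8], [0.8, 1.0, 0.8], [0.8, 0.8, 1.0]]
s2, s3, s6 = math.sqrt(2.0), math.sqrt(3.0), math.sqrt(6.0)
Q = [[1 / s2, 1 / s6, 1 / s3],
     [-1 / s2, 1 / s6, 1 / s3],
     [0.0, -2 / s6, 1 / s3]]  # orthonormal eigenvectors of R, one per column
lam = [0.2, 0.2, 2.6]         # matching eigenvalues

# A = Q diag(sqrt(lam)), so that A A^T = Q diag(lam) Q^T = R.
A = [[Q[i][j] * math.sqrt(lam[j]) for j in range(3)] for i in range(3)]
AAT = [[sum(A[i][k] * A[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


def draw_abilities(rng):
    """theta = xi A^T with xi ~ N(0, I): (theta_1, theta_2, theta_3) for the three subtests."""
    xi = [rng.gauss(0.0, 1.0) for _ in range(3)]
    return [sum(A[i][k] * xi[k] for k in range(3)) for i in range(3)]


rng = random.Random(11)
sample = [draw_abilities(rng) for _ in range(20000)]
mean0 = sum(s[0] for s in sample) / len(sample)
cov01 = sum(s[0] * s[1] for s in sample) / len(sample)  # approx. correlation, since variances are 1
print(f"mean of theta_1: {mean0:.3f}, E[theta_1 theta_2]: {cov01:.3f}")
```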

In this three-dimensional situation, we choose three representative items for each science for the simulation: one "easy" item with $b < -1.4$, one "normal" item with $-0.3 < b < 0.3$, and one "hard" item with $b > 1.7$ (nine items altogether for the General Science test). As before, we run the simulation twenty times, changing the random seed and using 1,500 simulees in every run. Our results show that neither package is significantly biased in parameter estimation, but the variances of the estimates increase compared with a unidimensional test. These increases are shown in Figure 8.

(Figure 8 about here.) 

As we can see, the largest and most significant increase is in the variance of the difficulty-parameter estimates from BilogMG. Further, with BilogMG the weighted distance between the estimated and "true" ICCs increases significantly, especially for "normal" items (Figure 9). In contrast, the increase in the variance of the difficulty-parameter estimates from DMAP is not significant relative to the normal case.



(Figure 9 about here.) 
5. Conclusion

We have demonstrated that the DMAP algorithm described above has about the same precision as BilogMG in calibrating items from the CAT-ASVAB seeded design. Moreover, in "special" circumstances, such as non-normality of the prior distribution of examinee ability or multidimensionality of item content, BilogMG loses precision, but DMAP does not. This is because BilogMG is a marginal algorithm, with normality built in, to some extent, through computing the joint distribution over quadrature points.

Another weakness of BilogMG is its use of the Newton-Raphson algorithm alone as the main engine for local sub-optimization. As we have already mentioned, this tool does not perform constrained optimization. However, from the point of view of maximizing the joint likelihood, BilogMG and DMAP use different heuristics, so either solution may be better or worse under different initial circumstances, depending on many "internal" conditions. Therefore, for real on-line calibration of CAT-ASVAB seeded items, we run both packages and choose the better solution by a $\chi^2$ evaluation.
Appendix

Convexity by Other Parameters 

As we have shown, for fixed $(a, b)$ the log-likelihood function $\ln L(a, b, c)$ is concave (convex upward) in $c$ and reaches its maximum either inside the prescribed segment $[c_{\min}, c_{\max}]$ or on its border. We now consider the case in which $\ln L(a, b, c)$ reaches its maximum in $c$ inside this segment. In this case there is a function $c = c(a, b)$ such that

$$\frac{\partial \ln L(a, b, c(a, b))}{\partial c} = 0. \tag{12}$$

Because all the functions considered are analytic under some regularity conditions (Kantorovich, 1968), the function $c = c(a, b)$ is also analytic, so it has derivatives of all orders. Let us write our 3PL function in the form

$$P(a, b, c)(\theta) = c + (1 - c)\, P_0(a, b)(\theta), \tag{13}$$

where $P_0(a, b)(\theta) = \frac{1}{1 + \exp(L(a, b, \theta))}$, i.e., $P_0(a, b)(\theta)$ is a 2PL ICC (here $L(a, b, \theta) = -D a (\theta - b)$, in agreement with (1)). Using (13), we can rewrite identity (12) in the form



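$$\frac{\partial \ln L(a, b, c(a, b))}{\partial c} = \sum_{m=1}^{M} \left( \frac{u_m}{P(a, b, c(a, b))(\theta_m)} - \frac{1 - u_m}{1 - P(a, b, c(a, b))(\theta_m)} \right) \left(1 - P_0(a, b)(\theta_m)\right) = 0. \tag{14}$$

Then for the derivative of $\ln L(a, b, c(a, b))$ with respect to $b$ we have

$$\frac{\partial \ln L(a, b, c(a, b))}{\partial b} = \sum_{m=1}^{M} \left( \frac{u_m}{P(a, b, c(a, b))(\theta_m)} - \frac{1 - u_m}{1 - P(a, b, c(a, b))(\theta_m)} \right) \frac{\partial P(a, b, c(a, b))(\theta_m)}{\partial b}$$

and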
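$$\begin{aligned}
\frac{\partial^2 \ln L(a, b, c(a, b))}{\partial b^2} = {} & -\sum_{m=1}^{M} \left( \frac{u_m}{P^2(a, b, c(a, b))(\theta_m)} + \frac{1 - u_m}{\left(1 - P(a, b, c(a, b))(\theta_m)\right)^2} \right) \left( \frac{\partial P(a, b, c(a, b))(\theta_m)}{\partial b} \right)^2 \\
& + \sum_{m=1}^{M} \left( \frac{u_m}{P(a, b, c(a, b))(\theta_m)} - \frac{1 - u_m}{1 - P(a, b, c(a, b))(\theta_m)} \right) \frac{\partial^2 P(a, b, c(a, b))(\theta_m)}{\partial b^2}.
\end{aligned}$$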



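The first sum in this expression is negative (its terms are nonnegative and it enters with a minus sign). To deal with the second sum, let us consider the second derivative $\frac{\partial^2 P(a, b, c(a, b))(\theta)}{\partial b^2}$. Differentiating (13), we have

$$\frac{\partial P(a, b, c(a, b))(\theta)}{\partial b} = \frac{\partial c(a, b)}{\partial b} \left(1 - P_0(a, b)(\theta)\right) - \left(1 - c(a, b)\right) \frac{D a \exp(L(a, b, \theta))}{\left(1 + \exp(L(a, b, \theta))\right)^2},$$

from which we get

$$\begin{aligned}
\frac{\partial^2 P(a, b, c(a, b))(\theta)}{\partial b^2} = {} & \frac{\partial^2 c(a, b)}{\partial b^2} \left(1 - P_0(a, b)(\theta)\right) + 2\, \frac{\partial c(a, b)}{\partial b} \, \frac{D a \exp(L(a, b, \theta))}{\left(1 + \exp(L(a, b, \theta))\right)^2} \\
& + \left(1 - c(a, b)\right) \frac{(D a)^2 \exp(L(a, b, \theta)) \left(\exp(L(a, b, \theta)) - 1\right)}{\left(1 + \exp(L(a, b, \theta))\right)^3}.
\end{aligned} \tag{15}$$

Here we use the expression

$$\frac{\partial P_0(a, b)(\theta)}{\partial b} = -\frac{D a \exp(L(a, b, \theta))}{\left(1 + \exp(L(a, b, \theta))\right)^2},$$

taking into account that $\frac{\exp(L(a, b, \theta))}{\left(1 + \exp(L(a, b, \theta))\right)^2} = P_0(a, b)(\theta) \left(1 - P_0(a, b)(\theta)\right)$. Then, keeping only the first term of (15), we have the approximation

$$\frac{\partial^2 P(a, b, c(a, b))(\theta)}{\partial b^2} \approx \frac{\partial^2 c(a, b)}{\partial b^2} \left(1 - P_0(a, b)(\theta)\right).$$

Under this approximation the second sum equals $\frac{\partial^2 c(a, b)}{\partial b^2}$ times the left-hand side of identity (14), which is zero, so only the negative first sum remains. This brings us to the conclusion that, in this approximation, $\frac{\partial^2 \ln L(a, b, c(a, b))}{\partial b^2}$ is negative, so the function $L(a, b, c(a, b))$ is concave (convex upward) in $b$ for fixed $a$. The same type of argument applies to the concavity of $L(a, b, c(a, b))$ with respect to $a$ for fixed $b$.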
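The two derivative formulas used above for the 2PL part, $\partial P_0 / \partial b = -D a P_0 (1 - P_0)$ and, consequently, $\partial^2 P_0 / \partial b^2 = (D a)^2 P_0 (1 - P_0)(1 - 2 P_0)$ (the last term of (15) without the factor $1 - c$), can be checked by finite differences. The Python sketch below does this for illustrative parameter values; it verifies the algebra only, not the approximation made afterward.

```python
import math

D = 1.7


def p0(theta, a, b):
    """2PL ICC P_0 = 1 / (1 + exp(L)) with L(a, b, theta) = -D * a * (theta - b)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def dp0_db(theta, a, b):
    p = p0(theta, a, b)
    return -D * a * p * (1.0 - p)


def d2p0_db2(theta, a, b):
    p = p0(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p) * (1.0 - 2.0 * p)


a, b = 1.3, 0.12  # illustrative parameter values
errors = []
for theta in (-2.0, -0.5, 0.7, 2.5):
    h1, h2 = 1e-6, 1e-4
    fd1 = (p0(theta, a, b + h1) - p0(theta, a, b - h1)) / (2 * h1)
    fd2 = (p0(theta, a, b + h2) - 2 * p0(theta, a, b) + p0(theta, a, b - h2)) / h2 ** 2
    errors.append(max(abs(fd1 - dp0_db(theta, a, b)), abs(fd2 - d2p0_db2(theta, a, b))))
print(f"largest discrepancy: {max(errors):.1e}")
```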
Table 1. Variances of 3PL parameters in the “Normal” simulation (M = number of examinees; BLG = BilogMG)

          A-parameter         B-parameter         C-parameter
M         BLG      DMAP       BLG      DMAP       BLG      DMAP
2000      0.0378   0.022      0.023    0.0154     0.0007   0.0041
1500      0.0231   0.0395     0.0231   0.0122     0.0004   0.0043
1000      0.0308   0.0428     0.0237   0.0169     0.0005   0.0051
750       0.0342   0.0428     0.025    0.0262     0.0006   0.0067
500       0.0668   0.0923     0.0318   0.0246     0.0003   0.0072
300       0.0942   0.2462     0.0428   0.0419     0.0006   0.0071
Table 2. Average distances between ICCs (M = number of examinees; BLG = BilogMG)

M         BLG      DMAP
2000      0.0221   0.0185
1500      0.0227   0.0196
1000      0.0235   0.0247
750       0.0254   0.0258
500       0.0322   0.0292
300       0.0399   0.0416




FIGURE 1. 



The case of a test of length two, where the first item was answered correctly and the second 
wrongly. 
FIGURE 2.

Results of AR simulation after standard Bayesian implementation.
FIGURE 3.



Results of AR simulation after DMAP implementation. 
FIGURE 4.

Variances of 3-PL parameters in the “Normal” simulation. 
FIGURE 5.

Average distances between ICCs. 
“NOT ABLE” POPULATION

     BLG       DMAP
A    -0.0328   0.1652
B    0.4321    0.0248
C    0.0016    0.0012

“ABLE” POPULATION

     BLG       DMAP
A    0.0211    0.1207
B    -0.3329   -0.0036
C    0.0031    -0.0275

FIGURE 6.

Biases in the case of a non-normal population.
[Figure 7: panels for the “NOT ABLE” and “ABLE” populations. Recoverable values: Avr. Dist.: BLG 0.0227, DMAP 0.0196; Av.D2-diff.: BLG 0.0805, DMAP 0.0226.]

FIGURE 7.

Weighted ICC differences in the case of a non-normal population.
              BLG      DMAP
IncrVr. A.    0.0461   0.0325
IncrVr. B.    0.1392   0.0134
IncrVr. C.    0.0005   0.0038

FIGURE 8.

Increases of variances in the three-dimensional case.
           BLG      DMAP
Easy       0.0129   -0.001
Normal     0.0336   0.0122
Hard       0.0119   0.0066
Overall    0.0194   0.006

FIGURE 9.

Increase in distances between ICCs.